Back
CS336 Assignment 2: profile and benchmark the model, implement FlashAttention-2 in Triton, and build distributed data parallel with optimizer sharding.
language models
cs336
triton
distributed training
notes